Configuring the Model
You can customize both inference-time and load-time parameters for your model. Inference parameters can be set on a per-request basis, while load parameters are set when loading the model.
Inference Parameters
Set inference-time parameters such as temperature, maxTokens, topP, and more.
import lmstudio as lms

model = lms.llm()  # any currently loaded model
chat = lms.Chat("You are a helpful assistant.")
result = model.respond(chat, config={
    "temperature": 0.6,
    "maxTokens": 50,
})
See LLMPredictionConfigInput in the TypeScript SDK documentation for all configurable fields.
Note that while structured can be set to a JSON schema definition as an inference-time configuration parameter (Zod schemas are not supported in the Python SDK), the preferred approach is to set the dedicated response_format parameter instead, which lets you enforce the structure of the output more rigorously using a JSON or class-based schema definition.
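For example, here is a minimal sketch of the class-based approach, assuming pydantic for the schema class (the BookSchema class and the prompt are illustrative):

from pydantic import BaseModel

import lmstudio as lms

# Illustrative class-based schema for the structured response
class BookSchema(BaseModel):
    title: str
    author: str
    year: int

model = lms.llm()
result = model.respond("Tell me about The Hobbit", response_format=BookSchema)
print(result.parsed)  # parsed output matching the schema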
Load Parameters
Set load-time parameters such as the context length, GPU offload ratio, and more.
.model()
The .model() method retrieves a handle to a model that has already been loaded, or loads a new one on demand (JIT loading).
Note: if the model is already loaded, the given configuration will be ignored.
import lmstudio as lms

model = lms.llm("qwen2.5-7b-instruct", config={
    "contextLength": 8192,
    "gpu": {
        "ratio": 0.5,
    }
})
See LLMLoadModelConfig in the TypeScript SDK documentation for all configurable fields.
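Because the configuration is ignored when the model is already loaded, one way to apply a different load configuration is to unload the existing instance first. A minimal sketch, assuming the model.unload() method from the model management APIs:

import lmstudio as lms

model = lms.llm("qwen2.5-7b-instruct")
model.unload()  # release the current instance so the new config takes effect
model = lms.llm("qwen2.5-7b-instruct", config={
    "contextLength": 8192,
})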
.load_new_instance()
The .load_new_instance() method creates a new model instance and loads it with the specified configuration.
import lmstudio as lms

client = lms.get_default_client()
model = client.llm.load_new_instance("qwen2.5-7b-instruct", config={
    "contextLength": 8192,
    "gpu": {
        "ratio": 0.5,
    }
})
See LLMLoadModelConfig in the TypeScript SDK documentation for all configurable fields.
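Unlike .model(), every call to .load_new_instance() creates a fresh instance, so you can keep several configurations of the same model loaded side by side. A minimal sketch (the variable names are illustrative, and keeping two instances loaded requires enough memory for both):

import lmstudio as lms

client = lms.get_default_client()
# Two independent instances of the same model with different context lengths
small_ctx = client.llm.load_new_instance("qwen2.5-7b-instruct", config={"contextLength": 4096})
large_ctx = client.llm.load_new_instance("qwen2.5-7b-instruct", config={"contextLength": 8192})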